Privacy Protection from Sampling and Perturbation in Survey Microdata
Authors
Abstract
Statistical agencies release microdata from social surveys as public-use files after applying statistical disclosure limitation (SDL) techniques. Disclosure risk is typically assessed in terms of identification risk, where it is supposed that small counts on cross-classified identifying key variables, i.e., a key, could be used to make an identification and confidential information may be learnt. In this paper we explore the application of definitions of privacy from the computer science literature to the same problem, with a focus on sampling and a form of perturbation which can be represented as misclassification. We consider two privacy definitions: differential privacy and probabilistic differential privacy. Chaudhuri and Mishra (2006) have shown that sampling does not guarantee differential privacy, but that, under certain conditions, it may ensure probabilistic differential privacy. We discuss these definitions and conditions in the context of survey microdata. We then extend this discussion to the case of perturbation. We show that differential privacy can be ensured if and only if the perturbation employs a misclassification matrix with no zero entries. We also show that probabilistic differential privacy is a viable alternative to differential privacy when there are zeros in the misclassification matrix. We discuss some common examples of SDL methods where in some cases zeros may be prevalent in the misclassification matrix.
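To make the "no zero entries" condition concrete, here is a minimal sketch (not from the paper; the function name and example matrices are illustrative) that computes the per-record privacy-loss bound for a column-stochastic misclassification matrix M, where M[i, j] is the probability of releasing category i for a record whose true category is j. A zero entry alongside a positive entry in the same row makes the log-ratio unbounded, so no finite epsilon can satisfy differential privacy.

```python
import numpy as np

def misclassification_epsilon(M):
    """Per-record epsilon for a column-stochastic misclassification matrix M,
    where M[i, j] = P(released category = i | true category = j).

    Illustrative sketch only: epsilon is the largest |log ratio| of release
    probabilities across any pair of true categories. A zero entry (with a
    non-zero entry elsewhere in the same row) makes this unbounded, i.e.,
    differential privacy cannot hold.
    """
    M = np.asarray(M, dtype=float)
    eps = 0.0
    n_release, n_true = M.shape
    for i in range(n_release):
        for j in range(n_true):
            for k in range(n_true):
                if M[i, k] == 0.0:
                    if M[i, j] > 0.0:
                        return float("inf")  # zero entry: no finite epsilon
                    continue
                if M[i, j] > 0.0:
                    eps = max(eps, abs(np.log(M[i, j] / M[i, k])))
    return eps

# Randomised-response style matrix with no zeros: finite epsilon.
M_pos = np.array([[0.8, 0.1],
                  [0.2, 0.9]])
print(misclassification_epsilon(M_pos))   # log(0.8/0.1) ~ 2.08

# A zero entry makes one release impossible for one true value: epsilon = inf.
M_zero = np.array([[1.0, 0.5],
                   [0.0, 0.5]])
print(misclassification_epsilon(M_zero))  # inf
```

This also indicates why the abstract turns to probabilistic differential privacy: that weaker notion only requires the bound to hold outside a low-probability set of outputs, so it can still be met when zeros are present in the misclassification matrix.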
Similar Articles
When Excessive Perturbation Goes Wrong and Why IPUMS-International Relies Instead on Sampling, Suppression, Swapping, and Other Minimally Harmful Methods to Protect Privacy of Census Microdata
IPUMS-International disseminates population census microdata at no cost for 69 countries. Currently, a series of 212 samples totaling almost half a billion person records is available to researchers. Registration is required for researchers to gain access to the microdata. Statistics from Google Analytics show that IPUMS-International's lengthy, probing registration form is an effective deterr...
Provably Private Data Anonymization: Or, k-Anonymity Meets Differential Privacy
Privacy-preserving microdata publishing currently lacks a solid theoretical foundation. Most existing techniques are developed to satisfy syntactic privacy notions such as k-anonymity, which fails to provide strong privacy guarantees. The recently proposed notion of differential privacy has been widely accepted as a sound privacy foundation for statistical query answering. However, no general p...
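As a concrete illustration of what a "syntactic" notion means here, the short sketch below (illustrative only; the table, attribute names, and helper function are hypothetical) checks k-anonymity directly on a released table: every combination of quasi-identifier values must appear in at least k records, a property of the data alone rather than of the release mechanism.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Illustrative check of the syntactic k-anonymity property: the released
    table is k-anonymous if every combination of quasi-identifier values
    occurs in at least k records (hypothetical helper, not from the paper).
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())

table = [
    {"age": "30-39", "zip": "0123*", "disease": "flu"},
    {"age": "30-39", "zip": "0123*", "disease": "asthma"},
    {"age": "40-49", "zip": "0456*", "disease": "flu"},
    {"age": "40-49", "zip": "0456*", "disease": "cancer"},
]
print(k_anonymity(table, ["age", "zip"]))  # 2 -> the release is 2-anonymous
```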
Microaggregation for Protecting Individual Data Privacy
Microaggregation is a technique for protecting the privacy of respondents in individual data (microdata) releases. This paper starts with a survey of the general definitions and concepts related to microdata protection and then reviews the state of the art of microaggregation, to which our group has substantially contributed.
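As a rough illustration of the basic idea (not the algorithms reviewed in that paper; the function and data below are hypothetical), a univariate microaggregation sketch sorts the records, forms groups of at least k consecutive values, and releases each group's mean in place of the original values.

```python
import numpy as np

def univariate_microaggregation(values, k=3):
    """Minimal univariate microaggregation sketch: sort the records, form
    consecutive groups of at least k, and release each group's mean instead
    of the original values. Real methods (e.g. multivariate MDAV) are more
    sophisticated; this only illustrates the idea.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    out = np.empty_like(values)
    n = len(values)
    start = 0
    while start < n:
        # If fewer than 2k records remain, put them all in the final group.
        end = n if n - start < 2 * k else start + k
        idx = order[start:end]
        out[idx] = values[idx].mean()
        start = end
    return out

incomes = [21, 95, 23, 40, 38, 97, 22, 41, 99, 37]
print(univariate_microaggregation(incomes, k=3))
```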
Global Disclosure Risk Measures and k-Anonymity Property for Microdata
In today’s world, governmental, public, and private institutions systematically release data that describe individual entities (commonly referred to as microdata). Those institutions are increasingly concerned with possible misuses of the data that might lead to disclosure of confidential information. Moreover, confidentiality regulation requires that privacy of individuals represented in the re...
Statistical Disclosure Control for Data Privacy Preservation
With the phenomenal change in the way data are collected, stored and disseminated among various data analysts, there is an urgent need to protect the privacy of data. When individual data are disseminated among various users, there is a high risk of revealing sensitive data about an individual, which may raise various legal and ethical issues. Statistical Disclosure Control (SDC) ...